
Conversation

@sgeraldes sgeraldes commented Jan 20, 2026

Problem

Spec creation was failing with a `max_tokens: 65537 > 64000` error for Claude Opus 4.5. The codebase had magic numbers scattered throughout (62000, 63999, 64000) with no validation against model-specific limits.

Root Cause

  1. Magic numbers: Token limits hardcoded in multiple files
  2. No model validation: Thinking budgets not validated against model-specific limits
  3. SDK bug workaround needed: Issue #8756 - SDK sometimes reduces max_tokens without adjusting thinking budget

Solution

Created a comprehensive model-specific configuration system:

Changes

  • apps/backend/model_limits.json: New configuration file with all Claude 4.5 model limits

    • All models: 64K max_output_tokens (Opus, Sonnet, Haiku 4.5)
    • Safe thinking budget: 60K tokens (leaves 4K buffer for SDK overhead)
    • Documents validation rules and SDK bug workaround
  • apps/backend/phase_config.py: Load limits from config, add validation

    • get_model_max_output_tokens(): Get model's max_tokens limit
    • get_model_max_thinking_tokens(): Get safe thinking budget
    • validate_thinking_budget(): Caps budgets to model limits with warnings
    • All thinking budget calls now validate against model limits (a sketch follows this list)
  • apps/frontend/src/shared/constants/models.ts: Add model limit constants

    • MODEL_OUTPUT_LIMITS: 64K for all models
    • MODEL_MAX_THINKING: 60K safe limit
    • Updated THINKING_BUDGET_MAP ultrathink to 60K
  • tests/test_model_limits.py: New comprehensive tests (10 tests)

    • Validates all models have correct limits
    • Tests budget capping for excessive values
    • Tests API constraint (thinking < max_tokens)
    • Tests 4K+ buffer for SDK overhead
  • tests/test_thinking_level_validation.py: Updated ultrathink budget to 60K
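
As a rough sketch of the phase_config.py pieces above (the function names come from this PR's summary; the bodies, fallback values, and log messages are assumptions, not the merged implementation):

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def _load_model_limits() -> dict:
    """Load model limits from model_limits.json, falling back to safe defaults."""
    limits_file = Path(__file__).parent / "model_limits.json"
    try:
        with open(limits_file, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        logger.warning("model_limits.json missing or invalid; using defaults")
        # Assumed fallback shape: one "default" entry covering all models.
        return {"default": {"max_output_tokens": 64000, "max_thinking_tokens": 60000}}


_MODEL_LIMITS = _load_model_limits()


def get_model_max_output_tokens(model_id: str) -> int:
    """Return the model's max_tokens limit (64,000 for all Claude 4.5 models)."""
    return _MODEL_LIMITS.get(model_id, _MODEL_LIMITS["default"])["max_output_tokens"]


def get_model_max_thinking_tokens(model_id: str) -> int:
    """Return the safe thinking budget (60,000: the 64K limit minus a 4K buffer)."""
    return _MODEL_LIMITS.get(model_id, _MODEL_LIMITS["default"])["max_thinking_tokens"]


def validate_thinking_budget(budget: int, model_id: str) -> int:
    """Cap an excessive thinking budget to the model's safe limit, with a warning."""
    max_thinking = get_model_max_thinking_tokens(model_id)
    if budget > max_thinking:
        logger.warning(
            "Thinking budget %d exceeds %s limit %d; capping",
            budget, model_id, max_thinking,
        )
        return max_thinking
    return budget
```

Call sites would then pass the active model ID so an ultrathink request is capped at 60,000 instead of the old 63,999.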

Technical Details

  • API constraint: `max_tokens > thinking.budget_tokens` (strictly greater)
  • All Claude 4.5 models: 64K max output, 200K context window
  • Safe thinking budget: 60K tokens (4K buffer for SDK overhead)
  • Graceful degradation: Warns and caps excessive budgets instead of failing

Why 60,000 instead of 63,999?

  • Model limit: 64,000 max_tokens
  • SDK overhead: ~4,000 tokens buffer needed
  • Safe budget: 60,000 tokens
  • Prevents the `max_tokens: 65537 > 64000` error (see the arithmetic sketch below)
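
Spelled out as a check (a small sketch; the 4,000-token buffer size is this PR's stated allowance for SDK overhead):

```python
MODEL_MAX_OUTPUT = 64_000    # max_tokens limit for all Claude 4.5 models
SDK_OVERHEAD_BUFFER = 4_000  # headroom reserved for SDK overhead

SAFE_THINKING_BUDGET = MODEL_MAX_OUTPUT - SDK_OVERHEAD_BUFFER  # 60_000

# The API requires max_tokens > thinking.budget_tokens (strictly greater).
assert SAFE_THINKING_BUDGET < MODEL_MAX_OUTPUT

# Even if the SDK shaves tokens off max_tokens without touching the thinking
# budget (issue #8756), reductions of up to 3,999 tokens still satisfy the
# strictly-greater constraint:
reduced_max_tokens = MODEL_MAX_OUTPUT - 3_999
assert reduced_max_tokens > SAFE_THINKING_BUDGET
```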

Testing

✅ All 19 tests pass (9 existing + 10 new)

  • Validates budget capping
  • Validates API constraints
  • Validates buffer requirements
  • Backward compatibility maintained

References

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Model-specific token limits now enforced across the system (64,000 output, 60,000 thinking budget)
    • Added validation to cap thinking budgets within model constraints and log warnings when limits are exceeded
  • Tests

    • Added comprehensive test coverage for model limits and thinking budget validation


Commit message:

Problem:
- Spec creation failing with "max_tokens: 65537 > 64000" error for Opus 4.5
- Magic numbers scattered across codebase (62000, 63999, 64000)
- No validation against model-specific limits
- SDK bug #8756 causes intermittent validation errors when max_tokens
  is reduced without adjusting thinking budget

Solution:
- Created model_limits.json configuration file with all Claude 4.5 model limits
- All models have 64K max_output_tokens (Opus, Sonnet, Haiku 4.5)
- Set ultrathink budget to 60K (leaves 4K buffer for SDK overhead)
- Added validation functions to cap thinking budgets to model limits
- Updated frontend constants to match backend configuration
- Added comprehensive tests for model-specific validation

Changes:
- apps/backend/model_limits.json: New configuration file with model limits
- apps/backend/phase_config.py: Load limits from config, add validation
- apps/frontend/src/shared/constants/models.ts: Add model limit constants
- tests/test_model_limits.py: New tests for model-specific validation (10 tests)
- tests/test_thinking_level_validation.py: Update ultrathink budget to 60K

Technical Details:
- API constraint: max_tokens > thinking.budget_tokens (strictly greater)
- All Claude 4.5 models: 64K max output, 200K context window
- Safe thinking budget: 60K tokens (4K buffer for SDK overhead)
- Graceful degradation: Warns and caps excessive budgets instead of failing

Testing:
- All 19 tests pass (9 existing + 10 new)
- Validates budget capping, API constraints, and buffer requirements

Implements a workaround for SDK issue anthropics/claude-code#8756

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Sebastian Geraldes <199673787+sebastiangeraldes@users.noreply.github.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@gemini-code-assist
Contributor

Summary of Changes

Hello @sgeraldes, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical issues related to hardcoded token limits for Claude models, which were causing max_tokens errors. By introducing a centralized, model-specific configuration system, the codebase now dynamically manages token limits and thinking budgets, ensuring robust validation and preventing runtime failures. The changes provide a more flexible and resilient approach to handling varying model constraints, improving the overall stability and maintainability of the system.

Highlights

  • Centralized Model Configuration: Introduced a new configuration file, apps/backend/model_limits.json, to centralize and define model-specific token limits, including max output tokens, context window, and safe thinking budgets for Claude 4.5 models.
  • Dynamic Token Limit Validation: Implemented functions in apps/backend/phase_config.py to dynamically load model limits and validate thinking budgets against these limits, preventing API errors caused by exceeding model-specific token constraints. This includes a graceful degradation mechanism that warns and caps excessive budgets.
  • Frontend and Backend Synchronization: Updated frontend constants in apps/frontend/src/shared/constants/models.ts to reflect the new model-specific output and thinking limits, ensuring consistency across the application.
  • Comprehensive Test Coverage: Added a new test suite in tests/test_model_limits.py with 10 new tests to rigorously validate the model limit configuration, budget capping logic, API constraints, and the necessary buffer for SDK overhead. Existing tests were also updated to reflect the new ultrathink budget.
  • SDK Bug Workaround: Incorporated a 4K token safety buffer into the maximum thinking budget (60K instead of 64K) to mitigate issues arising from an SDK bug where max_tokens might be reduced without corresponding adjustments to the thinking budget.

@coderabbitai
Contributor

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

A new centralized model limits configuration (model_limits.json) is added to define token constraints and thinking budgets for Claude model variants. The backend (phase_config.py) is updated to load and enforce these limits dynamically, with new validation functions. Frontend constants are synchronized with backend values, and comprehensive tests validate the model-aware budgeting behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration & Model Limits**<br>`apps/backend/model_limits.json` | New JSON configuration defining model variants (Claude 4.5 families), max output tokens (64,000), max thinking tokens (60,000), thinking levels with budgets (none/low/medium/high/ultrathink at 60,000), and validation rules for token constraints. |
| **Backend Model-Aware Budgeting**<br>`apps/backend/phase_config.py` | Integrated model limits loading with `_load_model_limits()`; added functions `get_model_max_output_tokens()`, `get_model_max_thinking_tokens()`, and `validate_thinking_budget()`; refactored `get_thinking_budget()`, `get_phase_thinking_budget()`, `get_phase_config()`, and `get_spec_phase_thinking_budget()` to accept an optional `model_id` parameter and enforce model-specific constraints; replaced the static `THINKING_BUDGET_MAP` with dynamic derivation from `_MODEL_LIMITS`. |
| **Frontend Model Constraints**<br>`apps/frontend/src/shared/constants/models.ts` | Added `MODEL_OUTPUT_LIMITS` (all models: 64,000) and `MODEL_MAX_THINKING` (all models: 60,000) exports; updated the `THINKING_BUDGET_MAP` ultrathink value from 63,999 to 60,000 with a clarifying comment. |
| **Model Limits Validation Tests**<br>`tests/test_model_limits.py` | New comprehensive test suite validating model output/thinking token limits, budget validation behavior, model-aware capping of excessive budgets, backward compatibility, and SDK buffer requirements (4,000-token minimum gap between max output and max thinking). |
| **Thinking Level Test Updates**<br>`tests/test_thinking_level_validation.py` | Updated ultrathink budget expectation from 63,999 to 60,000 in two test assertions; revised the documentation comment explaining the new 4K SDK buffer rationale. |
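
The walkthrough notes that the static THINKING_BUDGET_MAP is now derived from _MODEL_LIMITS; a plausible shape for that derivation (the ultrathink value is documented in this PR, but the other level budgets below are illustrative placeholders):

```python
# Derive the ultrathink budget from the loaded limits instead of hardcoding 63,999.
_DEFAULTS = _MODEL_LIMITS["default"]  # assumed key, as in the loader sketch above

THINKING_BUDGET_MAP: dict[str, int] = {
    "none": 0,
    "low": 8_000,       # placeholder value, not from this PR
    "medium": 16_000,   # placeholder value, not from this PR
    "high": 32_000,     # placeholder value, not from this PR
    "ultrathink": _DEFAULTS["max_thinking_tokens"],  # 60_000 per model_limits.json
}
```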

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • Issue #1323 — Addresses enforcement of ultrathink thinking budget limits in phase_config.py validation logic.
  • Issue #1212 — Addresses ultrathink budget value alignment between phase_config.py and frontend constants.
  • Issue #1218 — Modifies thinking-budget logic affecting both backend phase_config and frontend model constants.

Possibly related PRs

  • PR #1284 — Also updates ultrathink thinking-budget configuration in the same backend/frontend constants area.
  • PR #1173 — Directly modifies ultrathink maximum to 60,000 across phase_config and test files.

Suggested labels

area/fullstack

Poem

🐰 A rabbit hops through limits neat,
Sixty thousand tokens, ultrathink's treat!
From JSON config to budgets tight,
Models constrained with thinking just right,
Four-K buffer keeps SDK flight! ✨

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: replacing hardcoded token limits with model-specific configuration, which is the core objective and primary change across multiple files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%. |



@github-actions github-actions bot left a comment


🎉 Thanks for your first PR!

A maintainer will review it soon. Please make sure:

  • Your branch is synced with develop
  • CI checks pass
  • You've followed our contribution guide

Welcome to the Auto Claude community!

@sentry

sentry bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 85.36585% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| apps/backend/phase_config.py | 85.36% | 6 Missing ⚠️ |


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is an excellent improvement that addresses the problem of hardcoded token limits by introducing a model-specific configuration system. The new model_limits.json file centralizes all model constraints, and the backend code in phase_config.py is cleanly refactored to load, validate, and gracefully handle these limits. I particularly appreciate the addition of comprehensive tests in tests/test_model_limits.py, which ensure the new logic is robust and correct. My only suggestion for improvement relates to the duplication of model limits on the frontend, which could be addressed to further improve maintainability.

Comment on lines +35 to 53
```typescript
// Model-specific output token limits (all Claude 4.5 models have 64K max_tokens)
export const MODEL_OUTPUT_LIMITS: Record<string, number> = {
  'claude-opus-4-5-20251101': 64000,
  'claude-sonnet-4-5-20250929': 64000,
  'claude-haiku-4-5-20251001': 64000,
  opus: 64000,
  sonnet: 64000,
  haiku: 64000
} as const;

// Maximum safe thinking budget for each model (leaves buffer for SDK overhead)
export const MODEL_MAX_THINKING: Record<string, number> = {
  'claude-opus-4-5-20251101': 60000,
  'claude-sonnet-4-5-20250929': 60000,
  'claude-haiku-4-5-20251001': 60000,
  opus: 60000,
  sonnet: 60000,
  haiku: 60000
} as const;
```

Severity: medium

While centralizing the model limits on the backend is a great step, these new constants (MODEL_OUTPUT_LIMITS and MODEL_MAX_THINKING) duplicate the configuration from apps/backend/model_limits.json. This creates two sources of truth and could lead to inconsistencies if limits are updated in one place but not the other.

To improve maintainability and ensure consistency, consider creating a backend API endpoint that exposes these model limits. The frontend could then fetch this configuration when the application loads. This would make model_limits.json the single source of truth for the entire application.
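
A minimal sketch of that suggestion, assuming a FastAPI backend (the PR does not state which framework the backend uses, so take the endpoint shape as illustrative):

```python
from fastapi import FastAPI

from phase_config import _load_model_limits  # assumed import path

app = FastAPI()


@app.get("/api/model-limits")
def model_limits() -> dict:
    """Serve model_limits.json so the frontend can fetch a single source of truth."""
    return _load_model_limits()
```

The frontend would fetch this once at startup instead of mirroring the numbers in models.ts.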

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/backend/phase_config.py (1)

1-17: Ruff format check is failing for apps/backend.

CI reports one file needs formatting. Please run `ruff format apps/backend/ --quiet` and commit the changes.

tests/test_thinking_level_validation.py (1)

13-13: Fix import path casing to use lowercase "apps".

Line 13 uses "Apps" but the actual directory is lowercase "apps". On case-sensitive filesystems this will fail. Please align with the repository structure and other tests like test_model_limits.py.

Proposed fix
```diff
- sys.path.insert(0, str(Path(__file__).parent.parent / "Apps" / "backend"))
+ sys.path.insert(0, str(Path(__file__).parent.parent / "apps" / "backend"))
```

Comment on lines +25 to +31
```python
# Load model limits from configuration file
def _load_model_limits() -> dict:
    """Load model limits from model_limits.json."""
    limits_file = Path(__file__).parent / "model_limits.json"
    try:
        with open(limits_file, encoding="utf-8") as f:
            return json.load(f)
```

🛠️ Refactor suggestion | 🟠 Major

Use the platform abstraction module for path handling.

Line 28 constructs paths directly via Path. Per backend guidelines, route path handling through the project’s platform abstraction module.

🤖 Prompt for AI Agents
In `@apps/backend/phase_config.py` around lines 25 - 31, The _load_model_limits
function currently builds the file path via Path(__file__).parent /
"model_limits.json" (limits_file); update it to use the project’s platform
abstraction path API instead: import the platform abstraction module and replace
the limits_file construction with the platform’s path/resource helper (e.g.,
platform.join_path or platform.get_resource_path) so path handling follows
backend guidelines while keeping the same open(..., encoding="utf-8") call and
return json.load(f).

@AndyMik90 AndyMik90 self-assigned this Jan 20, 2026
Owner

@AndyMik90 AndyMik90 left a comment


🤖 Auto Claude PR Review

Merge Verdict: 🔴 BLOCKED

🔴 Blocked - 2 CI check(s) failing. Fix CI before merge.


Risk Assessment

| Factor | Level | Notes |
| --- | --- | --- |
| Complexity | Medium | Based on lines changed |
| Security Impact | None | Based on security findings |
| Scope Coherence | Good | Based on structural review |

🚨 Blocking Issues (Must Fix)

  • CI Failed: Lint Complete
  • CI Failed: Python (Ruff)

Findings Summary

  • Low: 3 issue(s)

Generated by Auto Claude PR Review

Findings (3 selected of 3 total)

🔵 [078e01afefcd] [LOW] [Potential] Frontend/backend configuration duplication requires manual sync

📁 apps/frontend/src/shared/constants/models.ts:26

Token limits (64000, 60000) and thinking level budgets are defined in both apps/backend/model_limits.json (source of truth) and apps/frontend/src/shared/constants/models.ts (lines 27-53). While a comment on line 26 documents the sync requirement, there's no automated validation. This is an acceptable trade-off given the complexity of sharing config between Python/TypeScript, but creates future maintenance burden.

Suggested fix:

Consider adding a CI test that compares frontend constants against backend JSON to catch drift. Alternatively, document the sync requirement more prominently in both files.
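
One way to implement the suggested drift check (a sketch; it assumes the TypeScript file spells the limits as plain numeric literals and that the JSON keys match the fields this PR describes):

```python
import json
from pathlib import Path

REPO_ROOT = Path(__file__).parent.parent


def test_frontend_constants_match_backend_json():
    """Fail CI if models.ts drifts from model_limits.json (crude literal check)."""
    backend = json.loads((REPO_ROOT / "apps/backend/model_limits.json").read_text())
    ts_source = (REPO_ROOT / "apps/frontend/src/shared/constants/models.ts").read_text()
    for model_id, limits in backend.items():
        if not isinstance(limits, dict):  # skip non-model metadata, if any
            continue
        # Assumed JSON keys: max_output_tokens and max_thinking_tokens per model.
        assert str(limits["max_output_tokens"]) in ts_source
        assert str(limits["max_thinking_tokens"]) in ts_source
```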

🔵 [d6f9e65c82a1] [LOW] [Potential] Cross-reference comment uses outdated path

📁 apps/backend/phase_config.py:54

Comment references 'auto-claude-ui/src/shared/constants/models.ts' but the actual frontend path is 'apps/frontend/src/shared/constants/models.ts'. This appears to be a legacy project name.

Suggested fix:

Update comment from 'auto-claude-ui/' to 'apps/frontend/'

🔵 [ea8f8d448fce] [LOW] [Potential] Missing tests for JSON file loading failure scenarios

📁 tests/test_model_limits.py:1

The _load_model_limits() function in phase_config.py handles FileNotFoundError and JSONDecodeError with fallback defaults (lines 32-48), but the test file has no coverage for these error paths. If fallback values are accidentally modified, no test would catch the regression.

Suggested fix:

Add tests that mock file loading failures: (1) Use unittest.mock.patch('builtins.open') to simulate FileNotFoundError, (2) Verify fallback dict is returned, (3) Verify warning is logged.
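
Following that suggestion, the error-path tests might look like this (a sketch; it assumes the fallback simply returns a non-empty defaults dict and logs a warning):

```python
import logging
from unittest.mock import mock_open, patch

import phase_config  # assumes tests put apps/backend on sys.path, as existing tests do


def test_fallback_on_missing_file(caplog):
    with caplog.at_level(logging.WARNING):
        with patch("builtins.open", side_effect=FileNotFoundError):
            limits = phase_config._load_model_limits()
    assert limits  # fallback defaults returned instead of raising
    assert any(r.levelno == logging.WARNING for r in caplog.records)


def test_fallback_on_invalid_json():
    with patch("builtins.open", mock_open(read_data="{not valid json")):
        limits = phase_config._load_model_limits()
    assert limits  # fallback defaults returned despite the JSONDecodeError
```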

This review was generated by Auto Claude.

@sgeraldes
Author

This comment is not valid. That is too much code for a quick patch. Tests were already added for the set of hardcoded magic numbers that were converted into configuration. Simplify and implement quickly.

@sgeraldes
Author

I have read the CLA Document and I hereby sign the CLA

@AndyMik90 AndyMik90 force-pushed the develop branch 2 times, most recently from 67a743f to e83e445 on January 21, 2026 at 14:26